EM training of finite-state transducers and its application to pronunciation modeling
Abstract
Recently, finite-state transducers (FSTs) have been shown to be useful for a number of applications in speech and language processing. FST operations such as composition, determinization, and minimization make manipulating FSTs very simple. In this paper, we present a method to learn weights for arbitrary FSTs using the EM algorithm. We show that this FST EM algorithm is able to learn pronunciation weights that improve the word error rate for a spontaneous speech recognition task.
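The abstract does not give the details of the FST EM algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: each training example is reduced to an explicit list of alternative paths through the transducer, each path carries a fixed external score (for instance an acoustic likelihood), and arc weights are probabilities normalized per source state. The names `em_step`, `arc_state`, and the toy "tomato" data are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from math import prod


def em_step(weights, arc_state, examples):
    """Run one EM iteration over arc weights (illustrative sketch only).

    weights:   dict mapping arc id -> current probability
    arc_state: dict mapping arc id -> source state (weights sum to 1 per state)
    examples:  list of examples; each example is a list of (path, score)
               pairs, where path is a tuple of arc ids and score is a fixed
               likelihood assumed to come from outside the model
    """
    # E-step: accumulate expected arc counts from path posteriors.
    counts = defaultdict(float)
    for paths in examples:
        joint = [score * prod(weights[a] for a in path) for path, score in paths]
        total = sum(joint)
        if total == 0.0:
            continue  # example has zero probability under the current weights
        for (path, _), j in zip(paths, joint):
            for arc in path:
                counts[arc] += j / total

    # M-step: renormalize expected counts among arcs leaving the same state.
    state_totals = defaultdict(float)
    for arc, c in counts.items():
        state_totals[arc_state[arc]] += c

    updated = {}
    for arc, old in weights.items():
        denom = state_totals.get(arc_state[arc], 0.0)
        # Keep the old weight for states that were never visited.
        updated[arc] = counts.get(arc, 0.0) / denom if denom > 0.0 else old
    return updated


# Toy data (assumed): one word with two pronunciation variants, "ah" and "ey".
weights = {"ah": 0.5, "ey": 0.5}
arc_state = {"ah": "tomato", "ey": "tomato"}
examples = [
    [(("ah",), 0.9), (("ey",), 0.1)],
    [(("ah",), 0.8), (("ey",), 0.2)],
    [(("ah",), 0.3), (("ey",), 0.7)],
]
updated = em_step(weights, arc_state, examples)
```

Enumerating paths explicitly is only practical for tiny acyclic examples; a realistic implementation would compute the same expected counts with forward-backward over the composed transducer.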
Related papers
On the Road to Improved Lexical Confusability Metrics
Pronunciation modeling in automatic speech recognition systems has had mixed results in the past; one likely reason for poor performance is the increased confusability in the lexicon from adding new pronunciation variants. In this work, we propose a new framework for determining lexically confusable words based on inverted finite state transducers (FSTs); we also present experiments designed to...
Explicit Modeling of Phonological Changes in Finite-state Transducer Based Hungarian LVCSR
This article describes the operation and the experimental evaluation of the pronunciation modeling component of the first Hungarian large vocabulary continuous speech recognition system. The proposed method is based on the implementation of context dependent rewrite rules by weighted finite state transducers (WFSTs). The proposed phonological model decreases the error rate by 8.32% relatively c...
Pronunciation modeling using a finite-state transducer representation
The MIT SUMMIT speech recognition system models pronunciation using a phonemic baseform dictionary along with rewrite rules for modeling phonological variation and multi-word reductions. Each pronunciation component is encoded within a finite-state transducer (FST) representation whose transition weights can be probabilistically trained using a modified EM algorithm for finite-state networks. Th...
Discriminative training of WFST factors with application to pronunciation modeling
One of the most popular speech recognition architectures consists of multiple components (like the acoustic, pronunciation and language models) that are modeled as weighted finite state transducer (WFST) factors in a cascade. These factor WFSTs are typically trained in isolation and combined efficiently for decoding. Recent work has explored jointly estimating parameters for these models using ...